
OCPBUGS-54188: Update Pod interactions with Topology Manager policies #95111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open

wants to merge 1 commit into base: main

Conversation

amolnar-rh
Contributor

@amolnar-rh commented Jun 23, 2025

Version(s): 4.12, 4.14, 4.15, 4.16, 4.17, 4.18, 4.19, 4.20

Issue: https://issues.redhat.com/browse/OCPBUGS-54188

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

@openshift-ci-robot added the jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/invalid-bug (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) labels Jun 23, 2025
@openshift-ci-robot

@amolnar-rh: This pull request references Jira Issue OCPBUGS-54188, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Version(s):

Issue:

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci bot added the size/S (denotes a PR that changes 10-29 lines, ignoring generated files) label Jun 23, 2025

openshift-ci bot commented Jun 23, 2025

@amolnar-rh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/validate-portal
Commit: a9adc33
Details: link
Required: true
Rerun command: /test validate-portal

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot

@amolnar-rh: This pull request references Jira Issue OCPBUGS-54188, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set
  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Version(s): 4.12, 4.14, 4.15, 4.16, 4.17, 4.18, 4.19, 4.20

Issue: https://issues.redhat.com/browse/OCPBUGS-54188

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@ffromani left a comment


some comments inside

@@ -32,9 +32,11 @@ spec:
memory: "100Mi"
----

If the selected policy is anything other than `none`, Topology Manager would not consider either of these `Pod` specifications.
If the selected policy is anything other than `none`, Topology Manager would consider either of the `BestEffort` or the `Burstable` QoS class `Pod` specifications.


Not sure here. When the Topology Manager policy is not `none`, it will indeed try to align all pods, but for pods whose QoS class is not Guaranteed, all the alignment logic degrades into a no-op. So, yes, we will go through the whole dance, but the result will be "no pinning, no alignment".
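As an illustrative sketch (the pod name, image, and values below are hypothetical), this is the kind of pod where the alignment logic ends up as a no-op: requests without matching limits put it in the Burstable QoS class.

----
# Hypothetical sketch: requests without matching limits give this pod the
# Burstable QoS class, so Topology Manager applies no NUMA pinning to it
# regardless of the configured policy.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable-example               # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest  # hypothetical image
    resources:
      requests:
        cpu: "1"
        memory: "100Mi"
# To become Guaranteed (and eligible for real alignment), every container
# would also need limits equal to its requests.
----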

@@ -32,9 +32,11 @@ spec:
memory: "100Mi"
----

If the selected policy is anything other than `none`, Topology Manager would not consider either of these `Pod` specifications.
If the selected policy is anything other than `none`, Topology Manager would consider either of the `BestEffort` or the `Burstable` QoS class `Pod` specifications.
When the Topology Manager policy is set to `none`, the relevant containers are pinned to any available CPU without considering NUMA affinity. This is the default behavior and does not optimize for performance-sensitive workloads.


We usually use "pinning" to mean "run on a precise set of resources", so I'm not sure the terminology is best here. "Pinned to anything" is something I don't see used much, but I'm also not a native English speaker.

Contributor Author

@amolnar-rh commented Jul 17, 2025


What about:

the relevant containers are assigned to run on any available set of CPUs...

Or should we keep it vague and, instead of specifying CPUs, say resources?


"the relevant containers are assigned to run on any available set of CPUs..." seems fine to me

If the selected policy is anything other than `none`, Topology Manager would not consider either of these `Pod` specifications.
If the selected policy is anything other than `none`, Topology Manager would consider either of the `BestEffort` or the `Burstable` QoS class `Pod` specifications.
When the Topology Manager policy is set to `none`, the relevant containers are pinned to any available CPU without considering NUMA affinity. This is the default behavior and does not optimize for performance-sensitive workloads.
Other values enable the use of topology awareness information from device plugins. The Topology Manager attempts to align the CPU, memory, and device allocations according to the topology of the node when the policy is set to other values than `none`. For more information about the available values, see _Additional resources_.


device plugins and core resources (cpu, memory)
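For context, a sketch of how the policy is usually selected on a node pool, assuming the standard KubeletConfig custom resource; the resource name and pool selector label below are hypothetical:

----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: topology-manager-example      # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled         # hypothetical label on the target MachineConfigPool
  kubeletConfig:
    cpuManagerPolicy: static          # exclusive CPU allocation is what makes pinning possible
    cpuManagerReconcilePeriod: 5s
    topologyManagerPolicy: single-numa-node   # or none, best-effort, restricted
----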

@@ -53,6 +55,6 @@ spec:
example.com/device: "1"
----

Topology Manager would consider this pod. The Topology Manager would consult the hint providers, which are CPU Manager and Device Manager, to get topology hints for the pod.
Topology Manager would consider this pod. The Topology Manager would consult the Hint Providers, which are CPU Manager and Device Manager, to get topology hints for the pod.


CPU Manager, Device Manager and Memory Manager
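As an illustrative sketch (hypothetical name, image, and values), the kind of pod all of these hint providers would return hints for is a Guaranteed pod requesting CPU, memory, and a device:

----
apiVersion: v1
kind: Pod
metadata:
  name: numa-aligned-example                # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest  # hypothetical image
    resources:
      # requests equal limits, so the pod has the Guaranteed QoS class
      requests:
        cpu: "2"
        memory: "1Gi"
        example.com/device: "1"
      limits:
        cpu: "2"
        memory: "1Gi"
        example.com/device: "1"
----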

@@ -16,15 +16,12 @@ This is the default policy and does not perform any topology alignment.

`best-effort` policy::

For each container in a pod with the `best-effort` topology management policy, kubelet calls each Hint Provider to discover their resource
availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager stores this and admits the pod to the node.
For each container in a pod with the `best-effort` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager stores this and admits the pod to the node.


This is technically correct but maybe too low level. The observable behavior of the best-effort policy is that the kubelet will try to align all the required resources on a NUMA node, but if the allocation is impossible (not enough resources) the allocation will spill into other NUMA nodes unpredictably. The pod will always be admitted.

Contributor Author

@amolnar-rh commented Jul 17, 2025


I tried to rephrase it. WDYT?

Kubelet tries to align all the required resources on a NUMA node according to the preferred NUMA node affinity for that container. Even if the allocation is not possible due to insufficient resources, the Topology Manager still admits the pod, but the allocation is shared with other NUMA nodes.

Contributor Author


The only reason I'm leaving out "unpredictably" is that I feel we'd need to explain what that means exactly.


Your rephrasing seems fine to me, thanks
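As a hypothetical illustration of that behavior: on a node with, say, 4 allocatable CPUs per NUMA node, a Guaranteed pod requesting 6 exclusive CPUs cannot fit on a single node; under `best-effort` the pod is still admitted and its CPUs end up split across two NUMA nodes.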

For each container in a pod with the `restricted` topology management policy, kubelet calls each Hint Provider to discover their resource
availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not
preferred, Topology Manager rejects this pod from the node, resulting in a pod in a `Terminated` state with a pod admission failure.
For each container in a pod with the `restricted` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager rejects this pod from the node, resulting in a pod in a `Terminated` state with a pod admission failure.


The observable behavior here is that the kubelet will determine the theoretical minimal number of NUMA nodes that can fulfill the request, and reject the admission if the actual allocation would take more than that number of NUMA nodes; otherwise the pod will go running.

Contributor Author


What do you mean that the "pod will go running"? Do you mean that the pod is admitted and it will run/operate?

Except for that part, I rephrased it:

kubelet determines the theoretical minimum number of NUMA nodes that can fulfill the request. If the actual allocation requires more than that number of NUMA nodes, the Topology Manager rejects the admission, resulting in a pod in a Terminated state with a pod admission failure.


What do you mean that the "pod will go running"? Do you mean that the pod is admitted and it will run/operate?

yes, precisely.
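To continue the same hypothetical illustration: with 4 allocatable CPUs per NUMA node, a request for 6 exclusive CPUs has a theoretical minimum of 2 NUMA nodes, so under `restricted` the pod is admitted as long as the actual allocation spans at most 2 nodes; if fragmentation forced the allocation across 3 nodes, admission would be rejected.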


`single-numa-node` policy::

For each container in a pod with the `single-numa-node` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager determines if a single NUMA Node affinity is possible. If it is, the pod is admitted to the node. If a single NUMA Node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a Terminated state with a pod admission failure.
For each container in a pod with the `single-numa-node` topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager determines if a single NUMA Node affinity is possible. If it is, the pod is admitted to the node. If a single NUMA Node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a `Terminated` state with a pod admission failure.


The observable behavior is that the kubelet will admit the pod iff all the resources required by the pod itself can be allocated on the same NUMA node. Arguably, it's the same as Restricted with a minimal number of NUMA nodes = 1.

Contributor Author


PTAL:

kubelet admits the pod if all the resources required by the pod can be allocated on the same NUMA node. If a single NUMA node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a Terminated state with a pod admission failure.


LGTM
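In that same hypothetical illustration, a pod whose 3 exclusive CPUs and single device are all free on one NUMA node is admitted under `single-numa-node`, while the 6-CPU pod is rejected because no single NUMA node can satisfy it.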
